fix: remove panic() from all reconciliation and CLI paths#477
Draft
flemzord wants to merge 29 commits into
Draft
fix: remove panic() from all reconciliation and CLI paths#477flemzord wants to merge 29 commits into
flemzord wants to merge 29 commits into
Conversation
The ObjectMutator signature already propagates errors; panicking on SetOwnerReference failure would crash the whole controller-manager instead of failing the single reconciliation.
GetAllStackDependencies already returns an error; panicking on a FromUnstructured conversion failure would crash the controller-manager on a single malformed resource instead of failing one reconciliation.
GetAs already returns an error; panicking on json.Marshal failure would crash the controller-manager instead of failing the single settings lookup.
The CreateOrUpdate mutate closure already returns an error; panicking would crash the controller-manager instead of failing the single ResourceReference reconciliation.
The enclosing reconcile helper already returns an error; panicking on a hashing failure would crash the controller-manager instead of failing the single Benthos reconciliation.
UnmarshalJSON runs while decoding user-provided custom resources: a single CR carrying an unparseable URL would crash the whole controller-manager. Returning the error fails only the decode of the offending object.
Registering the URI semantic-equality function can only fail on a programmer error and must abort startup; utilruntime.Must is the established idiom for this (already used in cmd/main.go) and reports the failure through the k8s error handlers instead of a bare panic.
Same idiom as cmd/main.go: scheme registration failure is a programmer error that must abort startup; utilruntime.Must reports it through the k8s error handlers instead of a bare panic.
CopyDir has no callers outside its own recursion and contained two panic() calls on filesystem errors. Removing it eliminates both panic sites.
…s in WithWatchVersions A transient API-server error while listing stacks in the Versions watch handler would crash the whole controller-manager. Log the error and drop the event instead: the next Versions event re-triggers the mapping.
… in WithWatchVersions ObjectKinds failing for the watch target is a registration problem; crashing the controller-manager from an event handler hides the root cause. Log the error and drop the event instead.
…thWatchVersions A transient list error for one stack would crash the whole controller-manager and lose the events of every other stack. Log the error and continue with the remaining stacks.
…Func Same rationale as the other WithWatchVersions handlers: an event handler must not crash the controller-manager on a kind-resolution error.
A kind-resolution failure (unregistered type) would crash the whole controller-manager from any module deployment helper. The function now returns the error; the eleven callers propagate it through their existing error paths, and the databases watch handler logs it and drops the event.
Same rationale as LowerCamelCaseKind: kind-resolution failure must not crash the controller-manager. Also reuse the already-computed object name in gatewayhttpapis.Create instead of resolving the kind twice.
An unknown broker mode in a Broker status would crash the whole controller-manager from any module deployment helper. The function now returns an error that the five callers propagate through their existing error paths, failing only the offending reconciliation.
Converting a module's unstructured content to read its status would panic and crash the controller-manager on a single malformed module. The condition-building closure now returns the error (after marking the condition false) and the reconciliation fails normally.
…atchResource A kind-resolution error in the watch map function would crash the whole controller-manager. Log the error and drop the event instead. Also hoist the loop-invariant GVK lookup out of the owner-references loop.
createDeployment already has an error path; a hashing failure now fails the single Auth reconciliation instead of crashing the controller-manager.
A hashing failure would crash the whole controller-manager from any of the three callers (benthosstreams, auths, caddy). The function now returns the error and callers propagate it through their existing error paths.
…king The templates map was built in an immediately-invoked closure that panicked on embedded-FS read errors, crashing the controller-manager. Extract it into a buildTemplates helper that returns the error so the single Benthos reconciliation fails instead.
The function already returns an error; a CLI must report failures through its normal error path, not a panic stack trace.
…king Same rationale as enable: report failures through the command's normal error path.
…cking Same rationale as enable: report failures through the command's normal error path.
…icking Same rationale as enable: report failures through the command's normal error path.
…anicking Same rationale as enable: report failures through the command's normal error path.
… panicking A failure to unlock stacks after an upgrade is exactly the situation the user must be told about cleanly; join it with the command error instead of panicking past the cobra error handling.
Register the scheme in main() and exit with a printed error instead of panicking from init(), keeping all CLI failures on the same stderr+exit-code path.
Contributor
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
The project review (see
review.md, finding C2/C3 and friends) identified 31panic()call sites in production code. In a controller-runtime operator, any panic in a reconcile path, watch event handler, or shared helper crashes the whole controller-manager: a single malformed resource or transient API-server error takes down reconciliation for every stack in the cluster.This PR removes every production
panic()— 30 sites. The only one left is ininternal/tests/internal/bootstrap.go(test bootstrap, where failing fast is idiomatic and Ginkgo owns the process).One commit per panic fix so each change can be reviewed and reverted independently (panics fixed by a single coherent change — e.g. the two embedded-FS reads inside one closure — share a commit).
Approach by panic class
return err, wrapped where useful)HashFromConfigMaps,LowerCamelCaseKind,LowerCaseKind,GetPublisherEnvVars,HashFromHash)(T, error); all callers updated to propagate through their existing error pathslog.FromContext(ctx)and drop the event; per-stack loopscontinueinstead of dying so one bad stack doesn't lose the others' eventsinit()-time scheme/equality registrationutilruntime.Must(...), the idiom already used incmd/main.goURI.UnmarshalJSONcore.CopyDir, no callers)upgradeis joined into the command error viaerrors.JoinNotable details
internal/resources/resourcereferences/init.go: the loop-invariantGVKForObjectlookup was also hoisted out of the owner-references loop.internal/resources/gatewayhttpapis/create.go: the kind was resolved twice for the same value; the second call now reuses the first result.internal/resources/stacks/init.go: the condition-building closure now returns the conversion error after marking the module conditionFalse("Unable to read module status"), preserving the deferred condition write.Verification
go build ./...andgo vet ./...pass on the main module and thetools/kubectl-stacksmodule (run after each commit for the touched package, plus a full pass at the end).make test: all 8 Ginkgo envtest suites pass.